This is the exploratory data analysis report for the MS capstone project sponsored by Aclima, Inc. The goal of this project is to investigate the associations of demographic factors with the concentration of air pollutants and understand the environmental inequality in Alameda County, California. The hyperlocal air quality data come from Aclima, Inc. and neighborhood-level measures of demographic factors are available from the U.S. Census Bureau.
The American Community Survey (ACS) is a demographic survey program conducted by the U.S. Census Bureau. It covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. The ACS estimates are based on data from a sample of housing units and people in the population. The survey goes to a random sample of addresses in every state, the District of Columbia, and Puerto Rico; thus, the weights do not need to be incorporated in the analysis. We use the 2015-2019 ACS 5-year data profile at the block group level, which is the smallest geographical unit for which the bureau publishes sample data. The 5-year estimates from the ACS are period estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
For each block group, I consider the following population characteristics as covariates:
In the following sections, I provide spatial representations of the data to better understand the demographics and to find clues about the spatial variation of population characteristics in Alameda County.
In order to recenter the map, click on the label on the bottom right.
The raw data provide estimates of the population self-identified as non-Hispanic white alone, non-Hispanic black or African American alone, non-Hispanic American Indian and Alaska Native alone, non-Hispanic Asian alone, non-Hispanic Native Hawaiian and Other Pacific Islander alone, non-Hispanic some other race alone, non-Hispanic two or more races, or Hispanic (8 categories). These categories are mutually exclusive and collectively exhaustive. I combine the small groups (non-Hispanic American Indian and Alaska Native alone, non-Hispanic Native Hawaiian and Other Pacific Islander alone, non-Hispanic some other race alone, and non-Hispanic two or more races) into one group, non-Hispanic other. As a result, we have 5 categories in total. These are the race categories used in Bell and Ebisu (2012). I calculate percentage estimates of the population in each of the 5 categories from the population estimates and visualize them at the block group level.
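A minimal sketch of this recoding for a single block group is given here. The dictionary keys are hypothetical labels rather than ACS variable codes, and the counts are made-up numbers used only to illustrate the arithmetic.

```python
# Hypothetical labels for the four small groups that are merged into
# "non-Hispanic other"; these are not ACS variable codes.
OTHER_GROUPS = ("nh_aian", "nh_nhpi", "nh_some_other", "nh_two_or_more")


def race_percentages(counts):
    """Collapse 8 mutually exclusive counts into 5 percentage estimates."""
    total = sum(counts.values())
    if total == 0:
        return None  # percentages are undefined without a population estimate
    combined = {
        "nh_white": counts["nh_white"],
        "nh_black": counts["nh_black"],
        "nh_asian": counts["nh_asian"],
        "nh_other": sum(counts[g] for g in OTHER_GROUPS),
        "hispanic": counts["hispanic"],
    }
    return {group: 100.0 * n / total for group, n in combined.items()}


# Illustrative counts only, not taken from the ACS data.
example = {
    "nh_white": 400, "nh_black": 100, "nh_asian": 250,
    "nh_aian": 5, "nh_nhpi": 5, "nh_some_other": 10, "nh_two_or_more": 30,
    "hispanic": 200,
}
pct = race_percentages(example)
print(pct)
```

Because the 8 input categories are mutually exclusive and collectively exhaustive, the 5 resulting percentages sum to 100 for any block group with a nonzero population estimate.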
Figure 1. Percentage estimates of population self-identified as non-Hispanic white alone
Figure 2. Percentage estimates of population self-identified as non-Hispanic black or African American alone
Figure 3. Percentage estimates of population self-identified as non-Hispanic Asian alone
Figure 4. Percentage estimates of population self-identified as non-Hispanic other
The range of percentage estimates is narrower than for the other categories.
Figure 5. Percentage estimates of population self-identified as Hispanic
There are 5 missing values in the data. Further investigation can be conducted on:
We clearly see that there is spatial structure in race/ethnicity at the block group level. Moreover, it is interesting to see the variation from one block group to the next.
The ACS data provide two types of age estimates. One is median age and the other is population in age by sex categories. The latter consists of 46 age by sex categories, but I combine them into 4 age categories: 0 to 24, 25 to 44, 45 to 64, or 65 years and over. Bell and Ebisu (2012) use different categories: 0 to 19, 20 to 64, or 65 years and over. Instead of following these categories, I use 24 as a boundary since we look at educational attainment among persons 25 or older and there are areas around the university campus. I also use 44 as another boundary because the interval between 20 and 64 years is too large.
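The aggregation from 46 age by sex cells to 4 age groups can be sketched as follows. The band lower bounds are an assumption about the layout of the ACS age by sex table (23 bands per sex), the function names are hypothetical, and the counts are illustrative only.

```python
# Assumed lower bounds of the 23 age bands reported for each sex
# (23 bands x 2 sexes = 46 cells). The bands break at 25, 45, and 65,
# so the lower bound alone determines the combined age group.
AGE_BAND_STARTS = [0, 5, 10, 15, 18, 20, 21, 22, 25, 30, 35, 40,
                   45, 50, 55, 60, 62, 65, 67, 70, 75, 80, 85]


def age_group(band_start):
    """Map the lower bound of an age band to one of the 4 age groups."""
    if band_start < 25:
        return "0-24"
    if band_start < 45:
        return "25-44"
    if band_start < 65:
        return "45-64"
    return "65+"


def age_percentages(cells):
    """cells maps (sex, band_start) to a population estimate."""
    totals = {"0-24": 0, "25-44": 0, "45-64": 0, "65+": 0}
    for (_sex, band_start), n in cells.items():
        totals[age_group(band_start)] += n  # sexes are pooled
    population = sum(totals.values())
    if population == 0:
        return None  # percentages are undefined for an empty block group
    return {group: 100.0 * n / population for group, n in totals.items()}


# Illustrative input: 10 people in every age by sex cell.
cells = {(sex, start): 10
         for sex in ("male", "female")
         for start in AGE_BAND_STARTS}
age_pct = age_percentages(cells)
print(age_pct)
```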
Figure 6. Estimates of median age
Figure 7. Percentage estimates of population aged 0 to 24 years
Figure 8. Percentage estimates of population aged 25 to 44 years
Figure 9. Percentage estimates of population aged 45 to 64 years
Figure 10. Percentage estimates of population aged \(\geq\) 65 years
Surprisingly, there is a block group, Block Group 5, Census Tract 4272, where the percentage estimate of the population aged 0 to 24 years is 100% and the estimated median age is 20.8 years. This may be because the ACS estimates are based on a sample, not the full population, and this block group is small; its population estimate is 75.
Patterns in the age distributions seem less obvious than those in race/ethnicity. However, it is interesting to see how different the distributions of nearby block groups in the same census tract can be. For example, around 50% of the population is aged 25 to 44 years in block groups 1 and 2 of census tract 4017, but not in block group 3 of the same tract, where the estimate for the age group 25 to 44 years is 0%.
Raw data provide estimates of population in the following categories: No schooling completed, Nursery school, Kindergarten, 1st grade, 2nd grade, 3rd grade, 4th grade, 5th grade, 6th grade, 7th grade, 8th grade, 9th grade, 10th grade, 11th grade, 12th grade (no diploma), Regular high school diploma, GED or alternative credential, Some college (less than 1 year), Some college (1 or more years, no degree), Associate’s degree, Bachelor’s degree, Master’s degree, Professional school degree, or Doctorate degree (24 categories). Following Hajat et al. (2013) where education was characterized as the percentage of persons with at least a high school degree and the percentage with at least a Bachelor’s degree for neighborhood SES index data, I calculate those two percentage estimates from our data.
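The two education measures can be sketched as cumulative shares over the ordered list of 24 categories. The function name and output keys are hypothetical, and the counts are illustrative only.

```python
# The 24 attainment categories, in the order listed in the text.
LEVELS = (
    ["No schooling completed", "Nursery school", "Kindergarten",
     "1st grade", "2nd grade", "3rd grade"]
    + [f"{g}th grade" for g in range(4, 12)]
    + ["12th grade (no diploma)", "Regular high school diploma",
       "GED or alternative credential", "Some college (less than 1 year)",
       "Some college (1 or more years, no degree)", "Associate's degree",
       "Bachelor's degree", "Master's degree", "Professional school degree",
       "Doctorate degree"]
)

# Every category from these positions onward counts toward the measure.
HS_START = LEVELS.index("Regular high school diploma")
BA_START = LEVELS.index("Bachelor's degree")


def education_percentages(counts):
    """counts is a list of 24 estimates aligned with LEVELS."""
    if len(counts) != len(LEVELS):
        raise ValueError("expected one count per attainment category")
    total = sum(counts)
    if total == 0:
        return None  # no persons 25 or older in this block group
    return {
        "pct_hs_or_more": 100.0 * sum(counts[HS_START:]) / total,
        "pct_bachelors_or_more": 100.0 * sum(counts[BA_START:]) / total,
    }


# Illustrative input: 10 persons in every category.
edu_pct = education_percentages([10] * len(LEVELS))
print(edu_pct)
```

Keeping the categories in an ordered list makes both thresholds a simple slice, so the "at least a Bachelor’s degree" share is always a subset of the "at least high school" share.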
Figure 11. Percentage estimates of persons 25 or older with at least high school education
Figure 12. Percentage estimates of persons 25 or older with at least a Bachelor’s degree
The two education measures have different ranges of estimates, but both maps show a clear spatial pattern: some areas have lower educational attainment than others. Compared with the race/ethnicity maps, these are approximately the areas where the majority of the population is Hispanic. We also see higher proportions of the population in the Berkeley area with at least a high school education and with at least a Bachelor’s degree compared to other areas.
I extract median household income and household estimates in 16 income categories from the ACS variables. Because the estimate of median household income in Alameda County is $99,406 (margin of error $921) in the 2015-2019 ACS 5-year data profile, I consider using the percentage estimate of households with $100,000 or more income.
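As a sketch, the share of households at or above $100,000 can be computed from the 16 income categories as shown here. The band lower bounds are an assumption about how the ACS income table is laid out, the function name is hypothetical, and the counts are illustrative only.

```python
# Assumed lower bounds (in dollars) of the 16 household income bands.
INCOME_BAND_STARTS = [
    0, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000,
    45_000, 50_000, 60_000, 75_000, 100_000, 125_000, 150_000, 200_000,
]


def pct_households_100k_plus(counts, threshold=100_000):
    """counts is a list of 16 household estimates aligned with the bands."""
    if len(counts) != len(INCOME_BAND_STARTS):
        raise ValueError("expected one count per income band")
    total = sum(counts)
    if total == 0:
        return None  # no households, so the percentage is undefined
    high = sum(n for start, n in zip(INCOME_BAND_STARTS, counts)
               if start >= threshold)
    return 100.0 * high / total


# Illustrative input: 25 households in every band.
share = pct_households_100k_plus([25] * 16)
print(share)
```

The threshold only gives an exact share when it coincides with a band boundary, which is the case for $100,000 under the assumed bands.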
Figure 13. Estimates of median household income
Figure 14. Percentage estimates of households with $100,000 or more income in the past 12 months
By visualizing all these variables, we can see not only spatial variation within the county but also correlations between variables.
Several studies have included poverty level and employment-related variables to examine associations between air pollution and SES (Bell and Ebisu 2012; Hajat et al. 2013). However, those variables are not provided at the block group level.
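The maps suggest these relationships visually. One way to quantify them, which is not part of the analysis above, is a Pearson correlation across block groups. The sketch below skips block groups where either variable is missing and ignores spatial autocorrelation; the variable names and values are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy)


# Illustrative values for five block groups; None marks a missing estimate.
pct_bachelors = [20.0, 40.0, None, 60.0, 80.0]
pct_income_100k = [10.0, 20.0, 30.0, 30.0, 40.0]
r = pearson(pct_bachelors, pct_income_100k)
print(r)
```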